Berkeley at NTCIR-2: Chinese, Japanese, and English IR experiments

Authors

  • Aitao Chen
  • Fredric C. Gey
  • Hailing Jiang
Abstract

This paper reports on the work of the Berkeley group at the second NTCIR workshop on Japanese & English IR and Chinese IR. Runs were submitted for all subtasks of the two main tasks. Our main focus in the Japanese monolingual subtask was on comparing the retrieval effectiveness of different segmentation methods. The experimental results show that bigram indexing outperformed word-based indexing in Japanese monolingual retrieval. Bigram indexing was also highly effective in Chinese monolingual retrieval. This paper presents an alternative segmentation method that breaks text into one-character and two-character terms that do not overlap with each other, which avoids the large index files produced by overlapping bigram indexing. The paper also describes a technique for building bilingual word lexicons from parallel text by sentence alignment and word association. A purely rank-based document pooling strategy is presented for combining monolingual retrieval results in multilingual retrieval.
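
The two most concrete ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. `overlapping_bigrams` shows standard overlapping character-bigram term extraction, the baseline the abstract says produces large index files. `round_robin_pool` is one possible reading of "purely rank-based" pooling: it interleaves monolingual result lists by rank position and ignores scores. The abstract does not spell out the actual pooling strategy, so round-robin interleaving, the function names, and the parameter `k` are all assumptions made for illustration. The non-overlapping one- and two-character segmentation method is not sketched because the abstract does not describe how it chooses segment boundaries.

```python
def overlapping_bigrams(text: str) -> list[str]:
    """Return every pair of adjacent characters in text, in order.

    Whitespace is dropped first. A string of n characters yields
    n - 1 terms, which is why overlapping bigram indexes grow large.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # Zero or one character: nothing to pair up.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def round_robin_pool(ranked_lists: list[list[str]], k: int) -> list[str]:
    """Merge ranked document-ID lists using rank position only.

    ASSUMPTION: round-robin interleaving is a stand-in for the paper's
    pooling strategy, which the abstract does not specify.
    """
    if k <= 0:
        return []
    pooled: list[str] = []
    seen: set[str] = set()
    depth = max((len(results) for results in ranked_lists), default=0)
    for rank in range(depth):
        # Take the document at this rank from each list in turn.
        for results in ranked_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                pooled.append(results[rank])
                if len(pooled) == k:
                    return pooled
    return pooled
```

For example, `overlapping_bigrams("情報検索")` yields the three terms 情報, 報検, and 検索, and `round_robin_pool([["c1", "c2"], ["j1", "j2", "j3"]], 4)` alternates between the two lists until four documents are collected.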

Similar resources

Experiments on Cross-language and Patent Retrieval at NTCIR-3 Workshop

The Berkeley group participated in the cross-language retrieval task and the patent retrieval task at the third NTCIR workshop. This paper describes our experiments on cross-language and patent retrieval. We present an automatic relevance feedback procedure for a document ranking formula based on logistic regression, and a procedure for automatically extracting Chinese/Japanese translations of Eng...

How Similar are Chinese and Japanese for Cross-Language Information Retrieval?

For NTCIR Workshop 5, UC Berkeley participated in the bilingual task of the CLIR track. Our focus was on Chinese topic searches against the Japanese News document collection, and on Japanese topic searches against the Chinese News document collection. Extending our work from the NTCIR-4 workshop, we performed search experiments to segment and use Chinese search topics directly as if they were Japanese t...

Search Between Chinese and Japanese Text Collections

For NTCIR Workshop 6 UC Berkeley participated in Phase 1 of the bilingual task of the CLIR track. Our focus was upon Japanese topic search against the Chinese News Document Collection and upon Chinese topic searches retrieving from Japanese News document collection. We performed search experiments to segment and use Chinese search topics directly as if they were Japanese topics and vice versa. ...

Chinese and Korean Topic Search of Japanese News Collections

UC Berkeley participated in the pivot bilingual task of the CLIR track at NTCIR Workshop 4. Our focus was on Chinese and Korean searches against the Japanese News document collection, using English as a pivot language. To compare our pivot techniques, we also submitted Japanese monolingual and English-Japanese bilingual search rankings. Two different commercial translation software pa...

Report on CLIR Task for the NTCIR-5 Evaluation Campaign

This paper describes our second participation in an evaluation campaign involving the Chinese, Japanese, Korean and English languages (NTCIR-5). Our participation is motivated by four objectives: 1) study the retrieval performance of various IR models for these languages; 2) compare the relative retrieval effectiveness of bigram and automatic word-segmenting approaches for Chinese and Japanese ...

Publication date: 2001